Dataset statistics
| Number of variables | 31 |
|---|---|
| Number of observations | 1033 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 250.3 KiB |
| Average record size in memory | 248.1 B |
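The average record size follows from the two figures above it. The short check below is only a sketch: it assumes 1 KiB = 1024 bytes, and the variable names are illustrative rather than taken from the profiling tool.

```python
# Figures copied from the "Dataset statistics" table.
total_kib = 250.3   # total size in memory, in KiB
n_rows = 1033       # number of observations

# Assumption: binary prefix, 1 KiB = 1024 bytes.
bytes_per_row = total_kib * 1024 / n_rows

print(f"{bytes_per_row:.1f} B per record")
```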
Variable types
| Boolean | 18 |
|---|---|
| Categorical | 4 |
| Numeric | 9 |
First Sit is highly correlated with Second Sit | High correlation |
Second Sit is highly correlated with First Sit | High correlation |
Fails is highly correlated with Pass | High correlation |
Pass is highly correlated with Fails | High correlation |
English is highly correlated with Maths | High correlation |
Maths is highly correlated with English | High correlation |
Btec is highly correlated with A Levels | High correlation |
SLC is highly correlated with Student Visa | High correlation |
Bursary is highly correlated with Polar_4_Score | High correlation |
desertion is highly correlated with Progression | High correlation |
Progression is highly correlated with desertion | High correlation |
British is highly correlated with Student Visa | High correlation |
A Levels is highly correlated with Btec | High correlation |
Polar_4_Score is highly correlated with Bursary | High correlation |
Student Visa is highly correlated with SLC and 1 other fields | High correlation |
UCAS is highly correlated with 25 Above and 1 other fields | High correlation |
25 Above is highly correlated with UCAS | High correlation |
Disability is highly correlated with Bursary | High correlation |
British is highly correlated with English native Language and 3 other fields | High correlation |
English native Language is highly correlated with British | High correlation |
Polar_4_Score is highly correlated with Bursary and 1 other fields | High correlation |
SLC is highly correlated with British and 1 other fields | High correlation |
Care Leaver is highly correlated with Refugee | High correlation |
Student Visa is highly correlated with British and 2 other fields | High correlation |
Refugee is highly correlated with Care Leaver | High correlation |
London Permanent Residence is highly correlated with British and 1 other fields | High correlation |
UCAS Points is highly correlated with English | High correlation |
English is highly correlated with UCAS Points and 1 other fields | High correlation |
Bursary is highly correlated with Disability and 1 other fields | High correlation |
Attendance is highly correlated with Progression and 1 other fields | High correlation |
Progression is highly correlated with Attendance and 3 other fields | High correlation |
First Sit is highly correlated with Second Sit and 2 other fields | High correlation |
Second Sit is highly correlated with First Sit and 1 other fields | High correlation |
Fails is highly correlated with Progression and 2 other fields | High correlation |
No Submissions is highly correlated with First Sit and 2 other fields | High correlation |
Pass is highly correlated with Progression and 2 other fields | High correlation |
Re Takes is highly correlated with No Submissions | High correlation |
desertion is highly correlated with UCAS and 6 other fields | High correlation |
Second Sit has 216 (20.9%) zeros | Zeros |
Fails has 850 (82.3%) zeros | Zeros |
No Submissions has 423 (40.9%) zeros | Zeros |
Reproduction
| Analysis started | 2022-09-02 17:48:07.334244 |
|---|---|
| Analysis finished | 2022-09-02 17:48:24.545679 |
| Duration | 17.21 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
UCAS
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| True | 938 | 90.8% |
| False | 95 | 9.2% |
25 Above
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 872 | 84.4% |
| True | 161 | 15.6% |
Disability
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 967 | 93.6% |
| True | 66 | 6.4% |
Ethnicity
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.2 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1033 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 603 | |
| 1 | 430 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 603 | |
| 1 | 430 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 603 | |
| 1 | 430 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1033 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 603 | |
| 1 | 430 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1033 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 603 | |
| 1 | 430 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1033 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 603 | |
| 1 | 430 |
Gender
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.2 KiB |
| Male | |
|---|---|
| Female |
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.762826718 |
| Min length | 4 |
Characters and Unicode
| Total characters | 4920 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Male |
|---|---|
| 2nd row | Male |
| 3rd row | Male |
| 4th row | Female |
| 5th row | Male |
Common Values
| Value | Count | Frequency (%) |
| Male | 639 | |
| Female | 394 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| male | 639 | |
| female | 394 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 1427 | |
| a | 1033 | |
| l | 1033 | |
| M | 639 | |
| F | 394 | 8.0% |
| m | 394 | 8.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 3887 | |
| Uppercase Letter | 1033 | 21.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 1427 | |
| a | 1033 | |
| l | 1033 | |
| m | 394 | 10.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 639 | |
| F | 394 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 4920 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 1427 | |
| a | 1033 | |
| l | 1033 | |
| M | 639 | |
| F | 394 | 8.0% |
| m | 394 | 8.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 4920 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 1427 | |
| a | 1033 | |
| l | 1033 | |
| M | 639 | |
| F | 394 | 8.0% |
| m | 394 | 8.0% |
British
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| True | 650 | 62.9% |
| False | 383 | 37.1% |
English native Language
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| True | 566 | 54.8% |
| False | 467 | 45.2% |
Parent He attendance
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 572 | 55.4% |
| True | 461 | 44.6% |
Polar_4_Score
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.2 KiB |
| 0.0 | |
|---|---|
| 1.0 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 3099 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 789 | |
| 1.0 | 244 | 23.6% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0.0 | 789 | |
| 1.0 | 244 | 23.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1822 | |
| . | 1033 | |
| 1 | 244 | 7.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2066 | |
| Other Punctuation | 1033 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1822 | |
| 1 | 244 | 11.8% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 1033 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3099 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 1822 | |
| . | 1033 | |
| 1 | 244 | 7.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3099 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1822 | |
| . | 1033 | |
| 1 | 244 | 7.9% |
SLC
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| True | 734 | 71.1% |
| False | 299 | 28.9% |
Care Leaver
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 1014 | 98.2% |
| True | 19 | 1.8% |
Student Visa
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 878 | 85.0% |
| True | 155 | 15.0% |
Refugee
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 1009 | 97.7% |
| True | 24 | 2.3% |
London Permanent Residence
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| True | 573 | 55.5% |
| False | 460 | 44.5% |
UCAS Points
Real number (ℝ≥0)
| Distinct | 60 |
|---|---|
| Distinct (%) | 5.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 108.8460794 |
| Minimum | 72 |
|---|---|
| Maximum | 168 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 72 |
|---|---|
| 5-th percentile | 82 |
| Q1 | 96 |
| median | 104 |
| Q3 | 119 |
| 95-th percentile | 152 |
| Maximum | 168 |
| Range | 96 |
| Interquartile range (IQR) | 23 |
Descriptive statistics
| Standard deviation | 19.70299151 |
|---|---|
| Coefficient of variation (CV) | 0.181017007 |
| Kurtosis | 1.082275217 |
| Mean | 108.8460794 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 1.042881347 |
| Sum | 112438 |
| Variance | 388.2078746 |
| Monotonicity | Not monotonic |
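Several entries in these tables are derived from others, which gives a quick consistency check. The sketch below recomputes the coefficient of variation, interquartile range, range, and variance from the reported mean 108.8460794, standard deviation 19.70299151, quartiles 96 and 119, and extremes 72 and 168. The variable names are illustrative assumptions.

```python
# Values copied from the quantile and descriptive statistics tables above.
mean = 108.8460794
std = 19.70299151
q1, q3 = 96, 119
minimum, maximum = 72, 168

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation

print(f"CV={cv:.6f}, IQR={iqr}, range={value_range}, variance={variance:.4f}")
```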
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 104 | 105 | 10.2% |
| 96 | 84 | 8.1% |
| 128 | 47 | 4.5% |
| 120 | 36 | 3.5% |
| 80 | 36 | 3.5% |
| 112 | 35 | 3.4% |
| 84 | 35 | 3.4% |
| 100 | 33 | 3.2% |
| 88 | 33 | 3.2% |
| 103 | 30 | 2.9% |
| Other values (50) | 559 |
| Value | Count | Frequency (%) |
| 72 | 4 | 0.4% |
| 80 | 36 | |
| 82 | 22 | |
| 84 | 35 | |
| 85 | 1 | 0.1% |
| 86 | 10 | 1.0% |
| 87 | 5 | 0.5% |
| 88 | 33 | |
| 89 | 7 | 0.7% |
| 90 | 6 | 0.6% |
| Value | Count | Frequency (%) |
| 168 | 25 | |
| 162 | 5 | 0.5% |
| 160 | 8 | 0.8% |
| 155 | 1 | 0.1% |
| 153 | 8 | 0.8% |
| 152 | 15 | |
| 148 | 6 | 0.6% |
| 146 | 4 | 0.4% |
| 144 | 18 | |
| 136 | 6 | 0.6% |
English
Real number (ℝ≥0)
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.936108422 |
| Minimum | 2 |
|---|---|
| Maximum | 9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 4 |
| median | 5 |
| Q3 | 5 |
| 95-th percentile | 8 |
| Maximum | 9 |
| Range | 7 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.273531665 |
|---|---|
| Coefficient of variation (CV) | 0.2580031791 |
| Kurtosis | 0.9445088077 |
| Mean | 4.936108422 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.7882962078 |
| Sum | 5099 |
| Variance | 1.621882903 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=8)
| Value | Count | Frequency (%) |
| 5 | 416 | |
| 4 | 290 | |
| 6 | 120 | 11.6% |
| 3 | 82 | 7.9% |
| 8 | 53 | 5.1% |
| 7 | 51 | 4.9% |
| 2 | 11 | 1.1% |
| 9 | 10 | 1.0% |
| Value | Count | Frequency (%) |
| 2 | 11 | 1.1% |
| 3 | 82 | 7.9% |
| 4 | 290 | |
| 5 | 416 | |
| 6 | 120 | 11.6% |
| 7 | 51 | 4.9% |
| 8 | 53 | 5.1% |
| 9 | 10 | 1.0% |
| Value | Count | Frequency (%) |
| 9 | 10 | 1.0% |
| 8 | 53 | 5.1% |
| 7 | 51 | 4.9% |
| 6 | 120 | 11.6% |
| 5 | 416 | |
| 4 | 290 | |
| 3 | 82 | 7.9% |
| 2 | 11 | 1.1% |
Maths
Real number (ℝ≥0)
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.80929332 |
| Minimum | 2 |
|---|---|
| Maximum | 9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 4 |
| median | 5 |
| Q3 | 5 |
| 95-th percentile | 7 |
| Maximum | 9 |
| Range | 7 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.104707503 |
|---|---|
| Coefficient of variation (CV) | 0.2297026672 |
| Kurtosis | 1.334057792 |
| Mean | 4.80929332 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.5204563193 |
| Sum | 4968 |
| Variance | 1.220378667 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=8)
| Value | Count | Frequency (%) |
| 5 | 417 | |
| 4 | 345 | |
| 6 | 124 | 12.0% |
| 7 | 60 | 5.8% |
| 3 | 46 | 4.5% |
| 2 | 22 | 2.1% |
| 8 | 14 | 1.4% |
| 9 | 5 | 0.5% |
| Value | Count | Frequency (%) |
| 2 | 22 | 2.1% |
| 3 | 46 | 4.5% |
| 4 | 345 | |
| 5 | 417 | |
| 6 | 124 | 12.0% |
| 7 | 60 | 5.8% |
| 8 | 14 | 1.4% |
| 9 | 5 | 0.5% |
| Value | Count | Frequency (%) |
| 9 | 5 | 0.5% |
| 8 | 14 | 1.4% |
| 7 | 60 | 5.8% |
| 6 | 124 | 12.0% |
| 5 | 417 | |
| 4 | 345 | |
| 3 | 46 | 4.5% |
| 2 | 22 | 2.1% |
A Levels
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| True | 579 | 56.1% |
| False | 454 | 43.9% |
Btec
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 654 | 63.3% |
| True | 379 | 36.7% |
Previous work
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| True | 541 | 52.4% |
| False | 492 | 47.6% |
Bursary
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 787 | 76.2% |
| True | 246 | 23.8% |
Attendance
Real number (ℝ≥0)
| Distinct | 63 |
|---|---|
| Distinct (%) | 6.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 75.08712488 |
| Minimum | 20 |
|---|---|
| Maximum | 100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 46 |
| Q1 | 64 |
| median | 76 |
| Q3 | 88 |
| 95-th percentile | 97 |
| Maximum | 100 |
| Range | 80 |
| Interquartile range (IQR) | 24 |
Descriptive statistics
| Standard deviation | 15.73841886 |
|---|---|
| Coefficient of variation (CV) | 0.2096020974 |
| Kurtosis | -0.6273441074 |
| Mean | 75.08712488 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | -0.3975210015 |
| Sum | 77565 |
| Variance | 247.6978283 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 60 | 34 | 3.3% |
| 92 | 31 | 3.0% |
| 95 | 29 | 2.8% |
| 74 | 28 | 2.7% |
| 96 | 27 | 2.6% |
| 90 | 27 | 2.6% |
| 81 | 27 | 2.6% |
| 72 | 26 | 2.5% |
| 65 | 25 | 2.4% |
| 94 | 25 | 2.4% |
| Other values (53) | 754 |
| Value | Count | Frequency (%) |
| 20 | 1 | 0.1% |
| 25 | 1 | 0.1% |
| 40 | 6 | |
| 41 | 6 | |
| 42 | 14 | |
| 43 | 3 | 0.3% |
| 44 | 8 | |
| 45 | 12 | |
| 46 | 7 | |
| 47 | 10 |
| Value | Count | Frequency (%) |
| 100 | 15 | |
| 99 | 15 | |
| 98 | 20 | |
| 97 | 15 | |
| 96 | 27 | |
| 95 | 29 | |
| 94 | 25 | |
| 93 | 13 | |
| 92 | 31 | |
| 91 | 16 |
Progression
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| True | 850 | 82.3% |
| False | 183 | 17.7% |
First Sit
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.019361084 |
| Minimum | 1 |
|---|---|
| Maximum | 6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 4 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 5 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.304291061 |
|---|---|
| Coefficient of variation (CV) | 0.3245020873 |
| Kurtosis | -0.7064480272 |
| Mean | 4.019361084 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.02188714218 |
| Sum | 4152 |
| Variance | 1.701175173 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 3 | 372 | |
| 4 | 219 | |
| 6 | 190 | |
| 5 | 183 | |
| 2 | 36 | 3.5% |
| 1 | 33 | 3.2% |
| Value | Count | Frequency (%) |
| 1 | 33 | 3.2% |
| 2 | 36 | 3.5% |
| 3 | 372 | |
| 4 | 219 | |
| 5 | 183 | |
| 6 | 190 |
| Value | Count | Frequency (%) |
| 6 | 190 | |
| 5 | 183 | |
| 4 | 219 | |
| 3 | 372 | |
| 2 | 36 | 3.5% |
| 1 | 33 | 3.2% |
Second Sit
Real number (ℝ≥0)
Alerts: High correlation, Zeros
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.818973863 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 216 |
| Zeros (%) | 20.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 3 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.256436187 |
|---|---|
| Coefficient of variation (CV) | 0.690739 |
| Kurtosis | -0.8238500089 |
| Mean | 1.818973863 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.002015094775 |
| Sum | 1879 |
| Variance | 1.578631892 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 3 | 348 | |
| 2 | 232 | |
| 0 | 216 | |
| 1 | 199 | |
| 5 | 20 | 1.9% |
| 4 | 18 | 1.7% |
| Value | Count | Frequency (%) |
| 0 | 216 | |
| 1 | 199 | |
| 2 | 232 | |
| 3 | 348 | |
| 4 | 18 | 1.7% |
| 5 | 20 | 1.9% |
| Value | Count | Frequency (%) |
| 5 | 20 | 1.9% |
| 4 | 18 | 1.7% |
| 3 | 348 | |
| 2 | 232 | |
| 1 | 199 | |
| 0 | 216 |
Fails
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5614714424 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 850 |
| Zeros (%) | 82.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 4 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.308272192 |
|---|---|
| Coefficient of variation (CV) | 2.330077886 |
| Kurtosis | 3.627648736 |
| Mean | 0.5614714424 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.21727455 |
| Sum | 580 |
| Variance | 1.711576127 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 0 | 850 | |
| 2 | 52 | 5.0% |
| 3 | 50 | 4.8% |
| 4 | 39 | 3.8% |
| 5 | 32 | 3.1% |
| 1 | 10 | 1.0% |
| Value | Count | Frequency (%) |
| 0 | 850 | |
| 1 | 10 | 1.0% |
| 2 | 52 | 5.0% |
| 3 | 50 | 4.8% |
| 4 | 39 | 3.8% |
| 5 | 32 | 3.1% |
| Value | Count | Frequency (%) |
| 5 | 32 | 3.1% |
| 4 | 39 | 3.8% |
| 3 | 50 | 4.8% |
| 2 | 52 | 5.0% |
| 1 | 10 | 1.0% |
| 0 | 850 |
No Submissions
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.234269119 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 423 |
| Zeros (%) | 40.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 4 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.363742454 |
|---|---|
| Coefficient of variation (CV) | 1.104898788 |
| Kurtosis | -0.06660940015 |
| Mean | 1.234269119 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.9492579822 |
| Sum | 1275 |
| Variance | 1.859793482 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 0 | 423 | |
| 1 | 253 | |
| 2 | 165 | 16.0% |
| 3 | 96 | 9.3% |
| 4 | 76 | 7.4% |
| 5 | 20 | 1.9% |
| Value | Count | Frequency (%) |
| 0 | 423 | |
| 1 | 253 | |
| 2 | 165 | 16.0% |
| 3 | 96 | 9.3% |
| 4 | 76 | 7.4% |
| 5 | 20 | 1.9% |
| Value | Count | Frequency (%) |
| 5 | 20 | 1.9% |
| 4 | 76 | 7.4% |
| 3 | 96 | 9.3% |
| 2 | 165 | 16.0% |
| 1 | 253 | |
| 0 | 423 |
Late Submission
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.2 KiB |
| 1 | |
|---|---|
| 0 | |
| 2 | |
| 3 | 25 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1033 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 424 | |
| 0 | 409 | |
| 2 | 175 | |
| 3 | 25 | 2.4% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1 | 424 | |
| 0 | 409 | |
| 2 | 175 | |
| 3 | 25 | 2.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 424 | |
| 0 | 409 | |
| 2 | 175 | |
| 3 | 25 | 2.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1033 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 424 | |
| 0 | 409 | |
| 2 | 175 | |
| 3 | 25 | 2.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1033 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 424 | |
| 0 | 409 | |
| 2 | 175 | |
| 3 | 25 | 2.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1033 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 424 | |
| 0 | 409 | |
| 2 | 175 | |
| 3 | 25 | 2.4% |
Pass
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 91.6934495 |
| Minimum | 16.66666667 |
|---|---|
| Maximum | 100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 16.66666667 |
|---|---|
| 5-th percentile | 33.33333333 |
| Q1 | 100 |
| median | 100 |
| Q3 | 100 |
| 95-th percentile | 100 |
| Maximum | 100 |
| Range | 83.33333333 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 19.74286232 |
|---|---|
| Coefficient of variation (CV) | 0.2153137703 |
| Kurtosis | 3.913579725 |
| Mean | 91.6934495 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -2.287006307 |
| Sum | 94719.33333 |
| Variance | 389.7806127 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 100 | 850 | |
| 33.33333333 | 51 | 4.9% |
| 50 | 50 | 4.8% |
| 66.66666667 | 39 | 3.8% |
| 83.33333333 | 32 | 3.1% |
| 16.66666667 | 10 | 1.0% |
| 86 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 16.66666667 | 10 | 1.0% |
| 33.33333333 | 51 | 4.9% |
| 50 | 50 | 4.8% |
| 66.66666667 | 39 | 3.8% |
| 83.33333333 | 32 | 3.1% |
| 86 | 1 | 0.1% |
| 100 | 850 |
| Value | Count | Frequency (%) |
| 100 | 850 | |
| 86 | 1 | 0.1% |
| 83.33333333 | 32 | 3.1% |
| 66.66666667 | 39 | 3.8% |
| 50 | 50 | 4.8% |
| 33.33333333 | 51 | 4.9% |
| 16.66666667 | 10 | 1.0% |
Re Takes
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 878 | 85.0% |
| True | 155 | 15.0% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
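The rank-then-correlate recipe can be sketched with the Python standard library alone. The helper names `average_ranks`, `pearson`, and `spearman` and the toy data are illustrative assumptions; this is not the code pandas-profiling runs internally.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman(x, y), pearson(x, y))
```

On this toy input the rank correlation is 1 up to floating-point rounding, while Pearson's r stays below 1 because the relationship is not linear.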
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
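As a small illustration of the definition and of the invariance under location and scale changes, the sketch below computes r by hand and then again after shifting and rescaling both variables. The function name `pearson_r` and the toy vectors are assumptions made for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Toy data, not taken from the profiled dataset.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_original = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_scaled = [2 * a + 10 for a in x]
y_scaled = [0.5 * b - 3 for b in y]
r_scaled = pearson_r(x_scaled, y_scaled)

print(r_original, r_scaled)
```

Both calls give the same value up to floating-point rounding, since the shifts cancel in the centred terms and the positive scale factors cancel between numerator and denominator.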
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
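The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated above (often called tau-a); library implementations commonly use tie-adjusted variants, so their results can differ when the data contain ties. The function name `kendall_tau_a` and the toy data are illustrative assumptions.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data: 10 pairs in total, 8 concordant and 2 discordant.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
print(tau)
```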
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | UCAS | 25 Above | Disability | Ethnicity | Gender | British | English native Language | Parent He attendance | Polar_4_Score | SLC | Care Leaver | Student Visa | Refugee | London Permanent Residence | UCAS Points | English | Maths | A Levels | Btec | Previous work | Bursary | Attendance | Progression | First Sit | Second Sit | Fails | No Submissions | Late Submission | Pass | Re Takes | desertion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | no | no | no | 1 | Male | no | no | yes | 0.0 | no | no | yes | no | yes | 98.0 | 5.0 | 4.0 | yes | no | yes | no | 86 | yes | 3 | 3.0 | 0 | 2 | 2 | 100.000000 | yes | no |
| 1 | no | no | no | 0 | Male | no | no | yes | 1.0 | yes | no | no | no | no | 101.0 | 5.0 | 5.0 | yes | no | yes | yes | 55 | no | 1 | 2.0 | 5 | 3 | 0 | 83.333333 | no | yes |
| 2 | no | no | no | 1 | Male | yes | yes | yes | 0.0 | yes | no | no | no | yes | 129.0 | 4.0 | 4.0 | yes | no | yes | no | 57 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | yes |
| 3 | no | yes | no | 0 | Female | no | no | no | 0.0 | yes | no | no | no | yes | 110.0 | 9.0 | 8.0 | yes | no | yes | no | 48 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | yes |
| 4 | no | no | no | 0 | Male | yes | yes | yes | 0.0 | yes | no | no | no | yes | 130.0 | 6.0 | 5.0 | yes | no | yes | no | 83 | yes | 4 | 2.0 | 0 | 2 | 0 | 100.000000 | no | no |
| 5 | yes | no | no | 1 | Male | yes | yes | yes | 0.0 | yes | no | no | no | yes | 112.0 | 6.0 | 4.0 | no | yes | no | no | 71 | yes | 3 | 3.0 | 0 | 0 | 1 | 100.000000 | no | no |
| 6 | yes | no | no | 0 | Male | no | yes | no | 0.0 | no | no | yes | no | no | 89.0 | 6.0 | 5.0 | yes | no | no | no | 96 | yes | 4 | 2.0 | 0 | 0 | 2 | 100.000000 | no | no |
| 7 | yes | no | no | 0 | Male | yes | yes | no | 0.0 | yes | no | no | no | yes | 103.0 | 4.0 | 5.0 | yes | no | no | no | 67 | yes | 3 | 3.0 | 0 | 3 | 0 | 100.000000 | no | no |
| 8 | yes | no | no | 0 | Male | yes | yes | no | 1.0 | no | no | no | no | yes | 128.0 | 4.0 | 4.0 | no | yes | no | yes | 89 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | no |
| 9 | yes | no | no | 0 | Female | yes | yes | no | 0.0 | no | no | no | no | no | 91.0 | 4.0 | 4.0 | no | no | no | no | 92 | yes | 6 | 0.0 | 0 | 1 | 1 | 100.000000 | no | no |
Last rows
| | UCAS | 25 Above | Disability | Ethnicity | Gender | British | English native Language | Parent He attendance | Polar_4_Score | SLC | Care Leaver | Student Visa | Refugee | London Permanent Residence | UCAS Points | English | Maths | A Levels | Btec | Previous work | Bursary | Attendance | Progression | First Sit | Second Sit | Fails | No Submissions | Late Submission | Pass | Re Takes | desertion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1023 | yes | no | no | 1 | Male | no | no | yes | 0.0 | yes | no | no | no | no | 107.0 | 6.0 | 7.0 | no | yes | no | no | 96 | yes | 6 | 0.0 | 0 | 0 | 1 | 100.000000 | no | no |
| 1024 | yes | yes | no | 0 | Male | yes | yes | no | 0.0 | yes | no | no | no | no | 103.0 | 5.0 | 6.0 | no | yes | yes | no | 67 | yes | 1 | 5.0 | 0 | 3 | 0 | 100.000000 | no | no |
| 1025 | yes | no | no | 0 | Male | yes | yes | yes | 0.0 | yes | no | no | no | no | 100.0 | 5.0 | 4.0 | yes | no | yes | no | 70 | yes | 6 | 0.0 | 0 | 0 | 1 | 100.000000 | no | no |
| 1026 | yes | yes | no | 0 | Female | no | no | yes | 0.0 | yes | no | no | no | no | 113.0 | 3.0 | 6.0 | no | yes | no | no | 64 | yes | 3 | 3.0 | 0 | 2 | 1 | 100.000000 | no | no |
| 1027 | yes | yes | no | 0 | Male | yes | no | yes | 1.0 | yes | no | no | no | yes | 118.0 | 5.0 | 5.0 | yes | no | yes | yes | 96 | yes | 3 | 3.0 | 0 | 1 | 0 | 100.000000 | no | no |
| 1028 | yes | no | no | 1 | Female | no | yes | no | 1.0 | yes | no | no | no | no | 102.0 | 4.0 | 4.0 | yes | no | yes | no | 55 | yes | 6 | 0.0 | 0 | 0 | 1 | 100.000000 | no | yes |
| 1029 | yes | no | no | 0 | Male | yes | yes | yes | 0.0 | yes | no | no | no | yes | 109.0 | 4.0 | 4.0 | yes | no | yes | no | 66 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | no |
| 1030 | no | yes | no | 1 | Female | no | no | no | 1.0 | no | no | no | no | no | 104.0 | 6.0 | 5.0 | yes | no | yes | no | 42 | no | 1 | 1.0 | 2 | 4 | 1 | 33.333333 | yes | yes |
| 1031 | no | yes | no | 1 | Male | no | no | yes | 0.0 | yes | no | no | no | no | 101.0 | 6.0 | 6.0 | no | yes | no | no | 60 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | no |
| 1032 | no | yes | no | 1 | Female | no | no | no | 0.0 | no | no | yes | no | no | 104.0 | 8.0 | 4.0 | no | no | no | no | 71 | yes | 6 | 0.0 | 0 | 0 | 0 | 100.000000 | no | no |